Activity Detection with Latent Sub-event Hierarchy Learning
Authors
Abstract
In this paper, we introduce a new convolutional layer, the Temporal Gaussian Mixture (TGM) layer, and show how it can be used to efficiently capture temporal structure in continuous activity videos. The layer is designed to allow the model to learn a latent hierarchy of sub-event intervals. Our approach is fully differentiable and relies on significantly fewer parameters, which enables end-to-end training with standard backpropagation. We present convolutional video models with multiple TGM layers for activity detection. Our experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the benefit of our TGM layers, showing that they outperform other models and temporal convolutions.
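As a rough illustration of the idea described in the abstract, the sketch below builds temporal kernels as weighted mixtures of a few Gaussians, so the number of learned values depends on the number of Gaussians, not on the kernel length. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `tgm_kernels` and `temporal_conv`, the parameter names, the tanh/exp reparameterization, and the softmax mixing are all illustrative choices.

```python
import numpy as np

def tgm_kernels(centers_raw, widths_raw, mix_logits, length):
    """Build temporal kernels as softmax-weighted mixtures of Gaussians.

    centers_raw, widths_raw: shape (M,), unconstrained parameters of M Gaussians
        (hypothetical parameterization, assumed for this sketch).
    mix_logits: shape (C, M), unconstrained mixing weights for C kernels.
    Returns an array of shape (C, length).
    """
    t = np.arange(length, dtype=np.float64)                  # positions 0..length-1
    centers = (length - 1) * (np.tanh(centers_raw) + 1) / 2  # squash into [0, length-1]
    widths = np.exp(widths_raw)                              # keep widths positive
    gauss = np.exp(-((t[None, :] - centers[:, None]) ** 2)
                   / (2 * widths[:, None] ** 2))
    gauss /= gauss.sum(axis=1, keepdims=True)                # each Gaussian sums to 1
    weights = np.exp(mix_logits - mix_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)            # softmax over the M Gaussians
    return weights @ gauss                                   # (C, M) @ (M, length)

def temporal_conv(features, kernels):
    """Convolve each kernel along time over a (T, D) sequence; returns (C, T, D)."""
    num_dims = features.shape[1]
    return np.stack([
        np.stack([np.convolve(features[:, d], k, mode="same")
                  for d in range(num_dims)], axis=1)
        for k in kernels
    ])

# Example: 4 kernels of length 15 built from only 2 Gaussians.
kernels = tgm_kernels(np.zeros(2), np.zeros(2), np.zeros((4, 2)), length=15)
responses = temporal_conv(np.ones((20, 3)), kernels)
```

In this sketch, 4 kernels of length 15 are described by 2 + 2 + 4 × 2 = 12 values, whereas 4 unconstrained kernels of the same length would need 60. Every step is a differentiable function of the raw parameters, which is what would allow training with backpropagation in an autodiff framework.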
Similar resources
Learning Latent Super-Events to Detect Multiple Activities in Videos
In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the opposite concept of sub-events. Real-world videos contain multiple activities and are rarely segmented...
Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree
Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, focusing on 'incongruent events' when 'general level' and 'specific level' classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial o...
Multiple Agent Event Detection and Representation in Videos
We propose a novel method to detect events involving multiple agents in a video and to learn their structure in terms of temporally related chain of sub-events. The proposed method has three significant contributions over existing frameworks. First, we present the concept of a video event graph, to learn the event structure from training videos. The video event graph is composed of temporally c...
Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process
Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., an RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been ...
Learning Latent Activities from Social Signals with Hierarchical Dirichlet Processes
Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., an RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been ...